AITopics | batch reinforcement learning

batch reinforcement learning

Policy Poisoning in Batch Reinforcement Learning and Control

Neural Information Processing Systems | Dec-25-2025, 05:02:01 GMT

We study a security threat to batch reinforcement learning and control where the attacker aims to poison the learned policy. The victim is a reinforcement learner / controller which first estimates the dynamics and the rewards from a batch data set, and then solves for the optimal policy with respect to the estimates. The attacker can modify the data set slightly before learning happens, and wants to force the learner into learning a target policy chosen by the attacker. We present a unified framework for solving batch policy poisoning attacks, and instantiate the attack on two standard victims: the tabular certainty equivalence learner in reinforcement learning and the linear quadratic regulator in control. We show that both instantiations result in a convex optimization problem on which global optimality is guaranteed, and provide analysis of attack feasibility and attack cost. Experiments show the effectiveness of policy poisoning attacks.
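The victim described above (estimate dynamics and rewards from the batch, then plan on the estimates) can be sketched for the tabular case as follows. This is only an illustrative sketch, not the paper's code: the function name `certainty_equivalence_policy`, the toy two-state batch, and the hand-edited "poisoned" reward are all assumptions made for the example, and the hand edit stands in for the attack without implementing the paper's convex optimization.

```python
from collections import defaultdict


def certainty_equivalence_policy(batch, n_states, n_actions, gamma=0.9, sweeps=200):
    """Fit a tabular model to (s, a, r, s_next) tuples, then run value iteration on it."""
    trans = defaultdict(int)     # (s, a, s_next) -> count
    rew = defaultdict(float)     # (s, a) -> summed reward
    visits = defaultdict(int)    # (s, a) -> count
    for s, a, r, s_next in batch:
        trans[(s, a, s_next)] += 1
        rew[(s, a)] += r
        visits[(s, a)] += 1

    def q(s, a, v):
        n = visits.get((s, a), 0)
        if n == 0:
            return None  # assumption: state-action pairs absent from the batch are ignored
        r_hat = rew[(s, a)] / n  # empirical mean reward
        exp_next = sum(trans.get((s, a, t), 0) / n * v[t] for t in range(n_states))
        return r_hat + gamma * exp_next

    v = [0.0] * n_states
    policy = [0] * n_states
    for _ in range(sweeps):
        new_v = []
        for s in range(n_states):
            scored = [(q(s, a, v), a) for a in range(n_actions)]
            scored = [(val, a) for val, a in scored if val is not None]
            best_val, best_a = max(scored, key=lambda p: p[0]) if scored else (0.0, 0)
            new_v.append(best_val)
            policy[s] = best_a
        v = new_v
    return policy, v


# Toy batch (hypothetical): two states, two actions, deterministic transitions.
clean_batch = [(0, 0, 0.0, 0), (0, 1, 1.0, 1), (1, 0, 0.0, 0), (1, 1, 2.0, 1)]
clean_policy, clean_values = certainty_equivalence_policy(clean_batch, 2, 2)

# Hand-edited reward on one tuple, standing in for an attacker's modification.
poisoned_batch = [(0, 0, 3.0, 0)] + clean_batch[1:]
poisoned_policy, _ = certainty_equivalence_policy(poisoned_batch, 2, 2)
print(clean_policy, poisoned_policy)
```

Because the learner trusts its estimates completely, changing a single recorded reward is enough in this toy case to change the action chosen in state 0.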

batch reinforcement learning, batch reinforcement learning and control, policy poisoning, (7 more...)

Industry: Information Technology > Security & Privacy (0.61)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

B2RL: An open-source Dataset for Building Batch Reinforcement Learning

Liu, Hsin-Yu, Fu, Xiaohan, Balaji, Bharathan, Gupta, Rajesh, Hong, Dezhi

arXiv.org Artificial Intelligence | Sep-30-2022

Batch reinforcement learning (BRL) is an emerging research area in the RL community. It learns exclusively from static datasets (i.e. replay buffers) without interaction with the environment. In the offline setting, existing replay experiences are used as prior knowledge for BRL models to find the optimal policy. Thus, generating replay buffers is crucial for benchmarking BRL models. In our B2RL (Building Batch RL) dataset, we collected real-world data from our building management systems, as well as buffers generated by several behavioral policies in simulation environments. We believe it could help building experts with BRL research. To the best of our knowledge, we are the first to open-source building datasets for the purpose of BRL.
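Learning "exclusively from static datasets" can be illustrated with a minimal tabular sketch: Q-learning updates are replayed over a fixed buffer and the environment is never queried. The function name `batch_q_learning`, the tuple layout `(s, a, r, s_next, done)`, and the four-transition buffer are assumptions for illustration and are not taken from the B2RL dataset.

```python
def batch_q_learning(buffer, n_states, n_actions, gamma=0.9, lr=0.5, epochs=200):
    """Replay a fixed buffer of transitions; no new environment interaction occurs."""
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(epochs):
        for s, a, r, s_next, done in buffer:
            # Bootstrapped target; terminal transitions use the reward alone.
            target = r if done else r + gamma * max(q[s_next])
            q[s][a] += lr * (target - q[s][a])
    return q


# Hypothetical buffer from a 3-state chain: action 1 moves right, action 0 moves left,
# and reaching state 2 ends the episode with reward 1.
buffer = [
    (0, 1, 0.0, 1, False),
    (1, 1, 1.0, 2, True),
    (1, 0, 0.0, 0, False),
    (0, 0, 0.0, 0, False),
]
q_table = batch_q_learning(buffer, n_states=3, n_actions=2)
greedy = [max(range(2), key=lambda a: q_table[s][a]) for s in range(2)]
print(greedy)
```

State-action pairs that never appear in the buffer keep their initial value of zero here, which is one reason the coverage of the behavioral policies that generated a buffer matters for benchmarking.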

artificial intelligence, machine learning, reinforcement learning, (13 more...)

arXiv:2209.15626

Country:

North America > United States > Massachusetts > Suffolk County > Boston (0.05)
North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
North America > United States > California > San Diego County > La Jolla (0.04)

Genre: Research Report (0.50)

Industry:

Energy > Power Industry (0.46)
Construction & Engineering > HVAC (0.30)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Batch Reinforcement Learning from Crowds

Zhang, Guoxi, Kashima, Hisashi

arXiv.org Artificial Intelligence | Nov-8-2021

A shortcoming of batch reinforcement learning is its requirement for rewards in data, which makes it inapplicable to tasks without reward functions. Existing settings for lack of reward, such as behavioral cloning, rely on optimal demonstrations collected from humans. Unfortunately, extensive expertise is required for ensuring optimality, which hinders the acquisition of large-scale data for complex tasks. This paper addresses the lack of reward in a batch reinforcement learning setting by learning a reward function from preferences. Generating preferences only requires a basic understanding of a task. Being a mental process, generating preferences is faster than performing demonstrations. So preferences can be collected at scale from non-expert humans using crowdsourcing. This paper tackles a critical challenge that emerged when collecting data from non-expert humans: the noise in preferences. A novel probabilistic model is proposed for modelling the reliability of labels, which utilizes labels collaboratively. Moreover, the proposed model smooths the estimation with a learned reward function. Evaluation on Atari datasets demonstrates the effectiveness of the proposed model, followed by an ablation study to analyze the relative importance of the proposed ideas.
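As a rough illustration of learning a reward from noisy pairwise preferences, the sketch below fits one scalar score per item with the standard Bradley-Terry model, where the probability that item i is preferred over item j is sigmoid(r_i - r_j). This is a generic baseline and not the reliability-aware model proposed in the paper; the function name, the three-item setup, and the label counts are hypothetical.

```python
import math


def fit_rewards_from_preferences(prefs, n_items, lr=0.1, steps=500):
    """Gradient ascent on the Bradley-Terry log-likelihood.

    prefs is a list of (winner, loser) index pairs, possibly contradictory.
    """
    r = [0.0] * n_items
    for _ in range(steps):
        grad = [0.0] * n_items
        for w, l in prefs:
            p_w = 1.0 / (1.0 + math.exp(-(r[w] - r[l])))  # model's P(w preferred over l)
            grad[w] += 1.0 - p_w   # d/dr_w of log p_w
            grad[l] -= 1.0 - p_w   # d/dr_l of log p_w
        r = [ri + lr * g for ri, g in zip(r, grad)]
    return r


# Hypothetical crowd labels: item 2 usually beats item 1, item 1 usually beats item 0,
# with one contradictory label in each pair to mimic annotator noise.
prefs = [(2, 1)] * 4 + [(1, 2)] + [(1, 0)] * 4 + [(0, 1)]
rewards = fit_rewards_from_preferences(prefs, n_items=3)
print(rewards)
```

With four of five labels agreeing on each pair, the fitted gap between adjacent items settles near log(4), so a minority of wrong labels shrinks the gap rather than reversing the ordering. Modelling per-annotator reliability, as the abstract describes, goes beyond this sketch.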

annotator, reliability, reward function, (15 more...)

arXiv:2111.04279

Country: Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.05)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Batch Reinforcement Learning for Smart Home Energy Management

Berlink, Heider (Universidade de Sao Paulo) | Costa, Anna HR (Universidade de Sao Paulo)

AAAI Conferences | Jul-15-2015

Smart grids enhance power grids by integrating electronic equipment, communication systems and computational tools. In a smart grid, consumers can insert energy into the power grid. We propose a new energy management system (called RLbEMS) that autonomously defines a policy for selling or storing energy surplus in smart homes. This policy is achieved through Batch Reinforcement Learning with historical data about energy prices, energy generation, consumer demand and characteristics of storage systems. In practical problems, RLbEMS has learned good energy selling policies quickly and effectively. We obtained maximum gains of 20.78% and 10.64%, when compared to a Naive-greedy policy, for smart homes located in Brazil and in the USA, respectively. Another important result achieved by RLbEMS was a reduction of about 30% in peak demand, a central desideratum for smart grids.

grid, rlbems, smart home, (15 more...)

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country:

South America > Brazil > São Paulo (0.05)
North America > Central America (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Industry:

Energy > Renewable > Solar (1.00)
Energy > Power Industry (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
